Category: DOM
DOM->loadHTML parsing problem

Hello guys,

I have a problem with DOMDocument's parsing of an HTML file, and I was wondering if anyone has an idea how to solve it.

Here is a snippet of the original HTML source:



<IMG SRC="pic1.gif" border=0>&nbsp&nbsp
<A HREF="school.html">School</a>


After I call $domdocument->loadHTMLFile("file.html") and then $domdocument->saveHTMLFile("output.html"), the snippet becomes:



<img src="pic1.gif" border="0">&amp;nbsp&amp;nbsp
<a href="school.html">School</a>


The two obviously render differently: because of the newly inserted &amp;, the output page shows the literal text "&nbsp" where the original showed spaces. I was hoping to get output that is exactly the same as the input.

Thanks much

This happens because the HTML parser that DOMDocument uses (libxml2's) does not recover from malformed markup the way browser parsers do. You could try html5lib (http://code.google.com/p/html5lib/downloads/detail?name=html5lib-php-0.1.tar.gz) to build the DOM instead.

The original HTML source code is itself slightly inaccurate: the semicolons that terminate the entities are missing. It should read:

<IMG SRC="pic1.gif" border=0>&nbsp;&nbsp;
<A HREF="school.html">School</a>
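
If the source file cannot be corrected by hand, the same fix could be applied in code before parsing. This is only a minimal sketch, assuming the input contains a known, small set of unterminated entities (here just &nbsp). The helper name fixUnterminatedEntities is hypothetical and is not part of PHP or DOMDocument.

```php
<?php
// Hypothetical helper (not part of PHP): appends the missing ';' to the
// listed entity names so the parser sees well-formed entity references.
function fixUnterminatedEntities(string $html, array $names = ['nbsp']): string
{
    $alternatives = implode('|', array_map('preg_quote', $names));
    // Match "&name" only when it is not already followed by ';' and is not
    // the prefix of a longer alphanumeric name.
    return preg_replace('/&(' . $alternatives . ')(?![A-Za-z0-9;])/', '&$1;', $html);
}

// The snippet from the question, with the unterminated entities.
$source = '<IMG SRC="pic1.gif" border=0>&nbsp&nbsp' . "\n"
        . '<A HREF="school.html">School</a>';

$fixed = fixUnterminatedEntities($source);

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($fixed);           // loadHTML() takes markup; loadHTMLFile() takes a path
echo $doc->saveHTML();
```

Even with the entities repaired, the saved output will not be byte-for-byte identical to the input: as the output in the question shows, tag names are lowercased and attribute values are quoted, and when given a fragment like this one the parser also wraps it in a doctype and html/body elements.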









